Aspose For Hadoop
=================

The Aspose for Hadoop project enables Hadoop developers to work with binary file formats. Hadoop / MapReduce developers can use it to create binary sequence files and convert them into text sequence files. The extracted text can then be analyzed in MapReduce algorithms.

Packages
--------
com.aspose.hadoop.core

      Provides wrapper classes around Aspose for Java that parse binary formats into text. The package also includes a couple of classes that override Hadoop input formats so that they can be used to create binary sequence files.
com.aspose.hadoop.examples

      Provides mapper examples for converting binary sequence file(s) into text sequence file(s). Each mapper example handles a particular set of binary formats, as explained in the next section.

Mapper Examples Flow and Usage
------------------------------

CreateBinarySequence

      Picks up a set of files from an HDFS directory, creates binary sequence file(s) from them, and stores the binary sequence file(s) in an HDFS directory.
      Usage: [HADOOP_HOME]$ bin/hadoop jar aspose-hadoop.jar com.aspose.hadoop.examples.CreateBinarySequence <HDFS input directory> <HDFS output directory>
CreateDocumentTextSequence

      Picks up binary sequence file(s) generated from documents (MS Word / OpenOffice documents) in an input HDFS directory, parses the text from the documents, and creates text sequence file(s) that are stored in an output HDFS directory.
      Usage: [HADOOP_HOME]$ bin/hadoop jar aspose-hadoop.jar com.aspose.hadoop.examples.CreateDocumentTextSequence <HDFS input directory> <HDFS output directory>
      Tip: Put your documents in an HDFS directory and use the CreateBinarySequence mapper to generate binary sequence file(s). Then supply the output directory of that mapper as the input here.
CreateSpreadSheetTextSequence

      Picks up binary sequence file(s) generated from spreadsheets (MS Excel / OpenOffice spreadsheets) in an input HDFS directory, parses the text from the spreadsheets, and creates text sequence file(s) that are stored in an output HDFS directory.
      Usage: [HADOOP_HOME]$ bin/hadoop jar aspose-hadoop.jar com.aspose.hadoop.examples.CreateSpreadSheetTextSequence <HDFS input directory> <HDFS output directory>
      Tip: Put your spreadsheets in an HDFS directory and use the CreateBinarySequence mapper to generate binary sequence file(s). Then supply the output directory of that mapper as the input here.
CreatePresentationTextSequence

      Picks up binary sequence file(s) generated from presentations (MS PowerPoint PPTX presentations) in an input HDFS directory, parses the text from the presentations, and creates text sequence file(s) that are stored in an output HDFS directory.
      Usage: [HADOOP_HOME]$ bin/hadoop jar aspose-hadoop.jar com.aspose.hadoop.examples.CreatePresentationTextSequence <HDFS input directory> <HDFS output directory>
      Tip: Put your PPTX presentations in an HDFS directory and use the CreateBinarySequence mapper to generate binary sequence file(s). Then supply the output directory of that mapper as the input here.
CreateEmailTextSequence

      Picks up binary sequence file(s) generated from emails (msg files) in an input HDFS directory, parses the text from the msg files, and creates text sequence file(s) that are stored in an output HDFS directory.
      Usage: [HADOOP_HOME]$ bin/hadoop jar aspose-hadoop.jar com.aspose.hadoop.examples.CreateEmailTextSequence <HDFS input directory> <HDFS output directory>
      Tip: Put your msg files in an HDFS directory and use the CreateBinarySequence mapper to generate binary sequence file(s). Then supply the output directory of that mapper as the input here.
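
The tips above all describe the same flow: upload the source files, run CreateBinarySequence, then pass its output directory to the matching text mapper. The following is a minimal sketch of that flow for documents, not an official script. The local and HDFS paths, the default HADOOP_HOME location, and the DRY_RUN switch are hypothetical and introduced only for illustration. The jar and class names are taken from the usage lines above. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the documented flow: upload -> binary sequence -> text sequence.
# Assumptions (not part of the project): all paths are hypothetical
# placeholders, and DRY_RUN defaults to 1 so commands are printed, not run.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"
JAR="aspose-hadoop.jar"
EXAMPLES="com.aspose.hadoop.examples"

# Run (or, in dry-run mode, print) a bin/hadoop invocation.
hadoop_cmd() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$HADOOP_HOME/bin/hadoop" "$@"
    else
        "$HADOOP_HOME/bin/hadoop" "$@"
    fi
}

# $1 = example class name, $2 = HDFS input directory, $3 = HDFS output directory
run_mapper() {
    hadoop_cmd jar "$JAR" "$EXAMPLES.$1" "$2" "$3"
}

# 1. Copy local documents into HDFS.
hadoop_cmd fs -put ./docs /user/demo/docs
# 2. Pack the uploaded files into binary sequence file(s).
run_mapper CreateBinarySequence /user/demo/docs /user/demo/docs-binseq
# 3. Feed the output directory of step 2 to the document text mapper.
run_mapper CreateDocumentTextSequence /user/demo/docs-binseq /user/demo/docs-textseq
```

For spreadsheets, presentations, or emails, only the class name in step 3 changes. Setting DRY_RUN=0 executes the commands instead of printing them.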

Aspose Pty Ltd: www.aspose.com
